AITopics | improved exploration

Collaborating Authors

improved exploration

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Neural Information Processing SystemsDec-24-2025, 20:52:22 GMT

adversarial mdp, improved exploration, policy optimization, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.59)

Add feedback

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Neural Information Processing SystemsJan-19-2025, 00:33:37 GMT

Policy optimization is a widely-used method in reinforcement learning. Due to its local-search nature, however, theoretical guarantees on global optimality often rely on extra assumptions on the Markov Decision Processes (MDPs) that bypass the challenge of global exploration. To eliminate the need of such assumptions, in this work, we develop a general solution that adds dilated bonuses to the policy update to facilitate global exploration. To showcase the power and generality of this technique, we apply it to several episodic MDP settings with adversarial losses and bandit feedback, improving and generalizing the state-of-the-art. When the number of states is infinite, under the assumption that the state-action values are linear in some low-dimensional features, we obtain \widetilde{\mathcal{O}}({T} {\frac{2}{3}}) regret with the help of a simulator, matching the result of Neu and Olkhovskaya [2020] while importantly removing the need of an exploratory policy that their algorithm requires.

assumption, improved exploration, policy optimization, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Add feedback

Thompson sampling for improved exploration in GFlowNets

Rector-Brooks, Jarrid, Madan, Kanika, Jain, Moksh, Korablyov, Maksym, Liu, Cheng-Hao, Chandar, Sarath, Malkin, Nikolay, Bengio, Yoshua

arXiv.org Artificial IntelligenceJun-30-2023

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2306.17693

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas > Upstream (0.49)
Education > Focused Education (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Add feedback

PAC Optimal Planning for Invasive Species Management: Improved Exploration for Reinforcement Learning from Simulator-Defined MDPs

Dietterich, Thomas G. (Oregon State University) | Taleghan, Majid Alkaee (Oregon State University) | Crowley, Mark (Oregon State University)

AAAI ConferencesJul-9-2013

Often the most practical way to define a Markov Decision Process (MDP) is as a simulator that, given a state and an action, produces a resulting state and immediate reward sampled from the corresponding distributions. Simulators in natural resource management can be very expensive to execute, so that the time required to solve such MDPs is dominated by the number of calls to the simulator. This paper presents an algorithm, DDV, that combines improved confidence intervals on the Q values (as in interval estimation) with a novel upper bound on the discounted state occupancy probabilities to intelligently choose state-action pairs to explore. We prove that this algorithm terminates with a policy whose value is within epsilon of the optimal policy (with probability 1-delta) after making only polynomially-many calls to the simulator. Experiments on one benchmark MDP and on an MDP for invasive species management show very large reductions in the number of simulator calls required.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

AAAI Conferences

Twenty-Seventh AAAI Conference on Artificial Intelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Add feedback